Learning Blind Motion Deblurring
As handheld video cameras are now commonplace and available in every
smartphone, images and videos can be recorded almost anywhere at any time.
However, taking a quick shot frequently yields a blurry result due to unwanted
camera shake during recording or moving objects in the scene. Removing these
artifacts from the blurry recordings is a highly ill-posed problem as neither
the sharp image nor the motion blur kernel is known. Propagating information
between multiple consecutive blurry observations can help restore the desired
sharp image or video. Solutions for blind deconvolution based on neural
networks rely on a massive amount of ground-truth data which is hard to
acquire. In this work, we propose an efficient approach to produce a
significant amount of realistic training data and introduce a novel recurrent
network architecture to deblur frames taking temporal information into account,
which can efficiently handle arbitrary spatial and temporal input sizes. We
demonstrate the versatility of our approach in a comprehensive comparison on a
number of challenging real-world examples.
Comment: International Conference on Computer Vision (ICCV), 2017
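To make the recurrent idea concrete, here is a minimal sketch (assuming PyTorch; the layer widths, two-layer encoder, and residual output are illustrative placeholders, not the paper's architecture): a fully convolutional cell carries a hidden state across frames, so arbitrary spatial sizes and sequence lengths are handled naturally.

```python
import torch
import torch.nn as nn

class RecurrentDeblurCell(nn.Module):
    def __init__(self, feat=32):
        super().__init__()
        # fuse the current blurry frame (3 channels) with the hidden state
        self.encode = nn.Sequential(
            nn.Conv2d(3 + feat, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(),
        )
        self.to_rgb = nn.Conv2d(feat, 3, 3, padding=1)

    def forward(self, frame, hidden):
        hidden = self.encode(torch.cat([frame, hidden], dim=1))
        # predict a residual correction on top of the blurry input
        return frame + self.to_rgb(hidden), hidden

def deblur_sequence(frames, feat=32):
    """frames: (T, B, 3, H, W) blurry clip -> (T, B, 3, H, W) estimates."""
    cell = RecurrentDeblurCell(feat)
    _, b, _, h, w = frames.shape
    hidden = torch.zeros(b, feat, h, w)   # works for any H, W and length T
    outputs = []
    for f in frames:                      # hidden state propagates temporal info
        sharp, hidden = cell(f, hidden)
        outputs.append(sharp)
    return torch.stack(outputs)

print(deblur_sequence(torch.randn(5, 1, 3, 64, 64)).shape)
```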
Learning From Multi-Frame Data
Multi-frame data-driven methods bear the promise that aggregating multiple observations leads to better estimates of target quantities than a single (still) observation.
This thesis examines how data-driven approaches such as deep neural networks should be constructed to improve over single-frame-based counterparts.
Besides algorithmic changes, for example in the design of artificial neural network architectures or the algorithm itself, such an examination is inextricably linked to the synthesis of training data of meaningful size (even if no annotations are available) and quality (if real ground-truth acquisition is not possible) that captures all temporal effects with high fidelity.
We start by introducing a new algorithm that accelerates a nonparametric learning method through a GPU-adapted implementation of nearest neighbor search.
Besides clearly surpassing previously known approaches, this empirically shows that the generated data can be handled in reasonable time and that several inputs can be processed in parallel even under hardware restrictions.
Building on a learning-based solution, we then introduce a novel training protocol that reduces the need for carefully curated training data and demonstrates better performance and robustness than nonparametric nearest neighbor search for temporal video alignment.
Effective learning in the absence of labels is required when dealing with larger amounts of data that are easy to capture but infeasible or at least costly to label.
In addition, we present new ways to generate plausible and realistic synthetic data and show that they are indispensable for closing the gap to expensive and often infeasible real-world acquisition.
These methods eventually achieve state-of-the-art results on classical image processing tasks such as reflection removal and video deblurring.
Efficient Large-scale Approximate Nearest Neighbor Search on the GPU
We present a new approach for efficient approximate nearest neighbor (ANN)
search in high dimensional spaces, extending the idea of Product Quantization.
We propose a two-level product and vector quantization tree that reduces the
number of vector comparisons required during tree traversal. Our approach also
includes a novel highly parallelizable re-ranking method for candidate vectors
by efficiently reusing already computed intermediate values. Due to its small
memory footprint during traversal, the method lends itself to an efficient,
parallel GPU implementation. This Product Quantization Tree (PQT) approach
significantly outperforms recent state-of-the-art methods for high-dimensional
nearest neighbor queries on standard reference datasets. Ours is the first work
that demonstrates GPU performance superior to CPU performance on
high-dimensional, large-scale ANN problems in time-critical real-world
applications, like loop closing in videos.
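The paper's two-level tree and GPU re-ranking are not reproduced here, but the product-quantization machinery it builds on can be sketched briefly. A minimal numpy sketch of asymmetric distance computation: each database vector is stored as M sub-codebook ids, and one query-specific lookup table reduces every distance evaluation to M table reads (sizes illustrative; the codebooks would be k-means-trained in practice).

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, K, N = 128, 8, 256, 10_000          # dim, subspaces, centroids, db size
Ds = D // M

codebooks = rng.normal(size=(M, K, Ds))   # per-subspace centroids (placeholder
                                          # for k-means-trained codebooks)
db = rng.integers(0, K, size=(N, M))      # database stored as compact PQ codes

def adc_search(query, topk=5):
    """Asymmetric distance computation: the query stays uncompressed."""
    q = query.reshape(M, Ds)
    # distance from each query subvector to every centroid: an (M, K) table
    table = ((codebooks - q[:, None, :]) ** 2).sum(axis=-1)
    # summing M table lookups approximates the full squared distance
    dists = table[np.arange(M), db].sum(axis=1)
    return np.argsort(dists)[:topk]

print(adc_search(rng.normal(size=D)))
```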
GGNN: Graph-based GPU Nearest Neighbor Search
Approximate nearest neighbor (ANN) search in high dimensions is an integral
part of several computer vision systems and gains importance in deep learning
with explicit memory representations. Since PQT and FAISS started to leverage
the massive parallelism offered by GPUs, GPU-based implementations are a
crucial resource for today's state-of-the-art ANN methods. While most of these
methods allow for faster queries, less emphasis is devoted to accelerating the
construction of the underlying index structures. In this paper, we propose a
novel search structure based on nearest neighbor graphs and information
propagation on graphs. Our method is designed to take advantage of GPU
architectures to accelerate the hierarchical building of the index structure
and for performing the query. Empirical evaluation shows that GGNN
significantly surpasses the state-of-the-art GPU- and CPU-based systems in
terms of build time, accuracy, and search speed.
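At query time, graph-based methods of this kind perform a greedy best-first walk over a nearest neighbor graph. A minimal sketch under simplifying assumptions (brute-force graph construction, a single entry point, a fixed expansion budget; GGNN's hierarchical GPU construction is not reproduced):

```python
import heapq
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(2_000, 32))

# illustrative kNN graph, built brute-force here (GGNN builds it on the GPU)
norms = (data ** 2).sum(1)
d2 = norms[:, None] + norms[None, :] - 2.0 * data @ data.T
graph = np.argsort(d2, axis=1)[:, 1:9]       # 8 neighbors per node

def greedy_graph_search(query, entry=0, topk=5, budget=64):
    dist = lambda i: float(((data[i] - query) ** 2).sum())
    visited = {entry}
    frontier = [(dist(entry), entry)]        # min-heap ordered by distance
    best = []                                # max-heap of the topk found so far
    while frontier and budget > 0:
        d, node = heapq.heappop(frontier)
        heapq.heappush(best, (-d, node))
        if len(best) > topk:
            heapq.heappop(best)
        for nb in map(int, graph[node]):     # propagate along graph edges
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (dist(nb), nb))
        budget -= 1
    return sorted((-d, n) for d, n in best)  # (distance, id), ascending

print(greedy_graph_search(rng.normal(size=32)))
```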
Reconfigurable Inverted Index
Existing approximate nearest neighbor search systems suffer from two
fundamental problems that are of practical importance but have not received
sufficient attention from the research community. First, although existing
systems perform well for the whole database, it is difficult to run a search
over a subset of the database. Second, there has been no discussion concerning
the performance degradation after many new items have been added to a system.
We develop a reconfigurable inverted index (Rii) to resolve these two issues.
Based on the standard IVFADC system, we design a data layout such that items
are stored linearly. This enables us to efficiently run a subset search by
switching the search method to a linear PQ scan if the size of a subset is
small. Owing to the linear layout, the data structure can be dynamically
adjusted after new items are added, maintaining the fast speed of the system.
Extensive comparisons show that Rii achieves performance comparable to
state-of-the-art systems such as Faiss.
Comment: ACMMM 2018 (oral). Code: https://github.com/matsui528/ri
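The subset-search idea can be illustrated with a small sketch. Because the PQ codes sit in one linear array, a query restricted to a small id subset can simply scan exactly those rows; the threshold and the omitted IVF path below are simplified placeholders, not Rii's tuned logic.

```python
import numpy as np

rng = np.random.default_rng(2)
D, M, K, N = 64, 8, 256, 50_000
Ds = D // M
codebooks = rng.normal(size=(M, K, Ds))
codes = rng.integers(0, K, size=(N, M))    # linear, contiguous PQ codes

def pq_scan(query, ids, topk=5):
    """Linear asymmetric-distance scan over an arbitrary id subset."""
    table = ((codebooks - query.reshape(M, 1, Ds)) ** 2).sum(-1)
    dists = table[np.arange(M), codes[ids]].sum(1)
    return ids[np.argsort(dists)[:topk]]

def search(query, subset_ids=None, topk=5, threshold=2_000):
    if subset_ids is not None and len(subset_ids) < threshold:
        # small subset: scan those rows directly, skipping the inverted index
        return pq_scan(query, subset_ids, topk)
    # stand-in: a full linear scan; Rii would run its IVF-style search here
    return pq_scan(query, np.arange(N), topk)

print(search(rng.normal(size=D), subset_ids=rng.choice(N, 100, replace=False)))
```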
Flex-Convolution: Million-Scale Point-Cloud Learning Beyond Grid-Worlds
Traditional convolution layers are specifically designed to exploit the
natural data representation of images -- a fixed and regular grid. However,
unstructured data like 3D point clouds with irregular neighborhoods constantly
break the grid-based data assumption. Therefore, applying best practices and
design choices from 2D image learning methods to point cloud processing is not
readily possible. In this work, we introduce flex-convolution, a natural
generalization of the conventional convolution layer, along with an efficient
GPU implementation. We demonstrate competitive
performance on rather small benchmark sets using fewer parameters and lower
memory consumption and obtain significant improvements on a million-scale
real-world dataset. Ours is the first method that can efficiently process 7
million points concurrently.
Comment: accepted at ACCV 201
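The core operation can be sketched compactly: instead of reading filter weights from a fixed grid, flex-convolution computes each neighbor's weight as an affine function of its relative position. A minimal numpy sketch with illustrative shapes (the efficient GPU kernel is not reproduced):

```python
import numpy as np

rng = np.random.default_rng(3)
N, Kn, Cin, Cout, dim = 1_000, 8, 4, 8, 3

points = rng.normal(size=(N, dim))               # irregular 3D point positions
feats = rng.normal(size=(N, Cin))                # per-point input features
theta = 0.1 * rng.normal(size=(Cin, Cout, dim))  # learned direction per pair
bias = 0.1 * rng.normal(size=(Cin, Cout))        # learned offset per pair

# illustrative kNN neighborhoods, found brute-force here
d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
nbrs = np.argsort(d2, axis=1)[:, :Kn]            # (N, Kn) neighbor indices

def flex_conv(points, feats, nbrs, theta, bias):
    rel = points[nbrs] - points[:, None, :]      # (N, Kn, dim) relative offsets
    # filter weight of each neighbor is affine in its relative position
    w = np.einsum('iod,nkd->nkio', theta, rel) + bias   # (N, Kn, Cin, Cout)
    # aggregate neighbor features with the position-dependent weights
    return np.einsum('nki,nkio->no', feats[nbrs], w)    # (N, Cout)

print(flex_conv(points, feats, nbrs, theta, bias).shape)
```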
Will People Like Your Image? Learning the Aesthetic Space
Rating how aesthetically pleasing an image appears is a highly complex matter
and depends on a large number of different visual factors. Previous work has
tackled the aesthetic rating problem by ranking on a 1-dimensional rating
scale, e.g., incorporating handcrafted attributes. In this paper, we propose a
rather general approach to automatically map aesthetic pleasingness with all
its complexity into an "aesthetic space" to allow for a highly fine-grained
resolution. In detail, making use of deep learning, our method directly learns
an encoding of a given image into this high-dimensional feature space
resembling visual aesthetics. In addition to the mentioned visual factors,
differences in personal judgments have a large impact on the likeableness of a
photograph. Nowadays, online platforms allow users to "like" or favor certain
content with a single click. To incorporate a huge diversity of people, we make
use of such multi-user agreements and assemble a large data set of 380K images
(AROD) with associated meta information and derive a score to rate how visually
pleasing a given photo is. We validate our derived model of aesthetics in a
user study. Further, without any extra data labeling or handcrafted features,
we achieve state-of-the-art accuracy on the AVA benchmark data set. Finally, as
our approach is able to predict the aesthetic quality of any arbitrary image or
video, we demonstrate our results on applications for re-sorting photo
collections, capturing the best shot on mobile devices, and aesthetic key-frame
extraction from videos.
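One way to make the training signal concrete: images can be ordered by a like-derived score, and a ranking loss pushes their learned scores apart. A minimal sketch assuming PyTorch; the backbone, the exact score definition, and the margin are placeholders rather than the paper's formulation.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(          # stand-in for a CNN feature extractor
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 64),             # 64-d "aesthetic space" embedding
)
score_head = nn.Linear(64, 1)      # scalar aesthetic score from the embedding
rank_loss = nn.MarginRankingLoss(margin=0.2)

def training_step(img_hi, img_lo):
    """img_hi carries the higher like-derived score than img_lo (B, 3, H, W)."""
    s_hi = score_head(backbone(img_hi)).squeeze(1)
    s_lo = score_head(backbone(img_lo)).squeeze(1)
    # target +1: s_hi should exceed s_lo by at least the margin
    return rank_loss(s_hi, s_lo, torch.ones_like(s_hi))

loss = training_step(torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64))
print(loss.item())
```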